The Influence of Chunking on Dependency Crossing and Distance

Authors

  • Qian Lu
  • Chunshan Xu
  • Haitao Liu
Abstract

This paper hypothesizes that chunking plays an important role in reducing dependency distance and dependency crossings. Computer simulations, when compared with natural languages, show that chunking reduces the mean dependency distance (MDD) of a linear sequence of nodes (constrained by continuity or projectivity) to that of natural languages. More interestingly, chunking alone also brings about fewer dependency crossings, though it fails to reduce them to the rarity found in human languages. These results suggest that chunking may play a vital role in the minimization of dependency distance and a contributing role in the rarity of dependency crossings. In addition, the results point to the possibility that the rarity of dependency crossings is not a mere side effect of the minimization of dependency distance, but a linguistic phenomenon with its own motivations.

Introduction. – Language used in communication is invariably presented linearly, one unit after another, which is regarded as one of its fundamental properties [1]. However, there is always a syntactic tree structure underlying a one-dimensional linear sentence, a structure underpinning both the production and the comprehension of this sentence [2,3]. Therefore, language processing consists, to a considerable degree, in the transformation between the syntactic tree structure and the one-dimensional linear arrangement. What properties can be found in the tree structure of language? What mechanisms constrain the transformation of tree structure into linear structure? The answers to these questions, which may well require research based on statistical physics and computer simulation, will probably shed much light on how human language operates. In terms of dependency grammar, the structure of a sentence can be visualized as a hierarchical dependency tree, whose nodes (vertices) are words, linked to one another by directed edges (dependency relations) [2,3].
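As a rough illustration of the two quantities discussed here, the sketch below represents a linearized dependency tree as a list of head positions and computes its mean dependency distance and its number of crossing edges. It is not the authors' simulation code. The function names, the head-list encoding, and the example sentences are all hypothetical, and dependency distance is taken here as the absolute difference between the linear positions of a word and its head (so adjacent words are at distance 1), which is an assumed convention.

```python
from itertools import combinations


def mean_dependency_distance(heads):
    """Mean dependency distance of one sentence.

    heads[i] is the 0-based position of the head of word i, or None for
    the root. Distance is assumed to be |position - head position|.
    """
    distances = [abs(i - h) for i, h in enumerate(heads) if h is not None]
    return sum(distances) / len(distances) if distances else 0.0


def count_crossings(heads):
    """Number of pairs of dependency edges that cross in the linear order."""
    # Store each edge as (left position, right position).
    edges = [tuple(sorted((i, h))) for i, h in enumerate(heads) if h is not None]
    crossings = 0
    for (a, b), (c, d) in combinations(edges, 2):
        # Two edges cross when exactly one endpoint of one edge lies
        # strictly inside the span of the other.
        if a < c < b < d or c < a < d < b:
            crossings += 1
    return crossings


# Hypothetical projective example: "The dog chased the cat"
# The -> dog, dog -> chased, chased = root, the -> cat, cat -> chased
projective = [1, 2, None, 4, 2]

# Hypothetical non-projective arrangement of four words
non_projective = [2, 3, None, 2]

print(mean_dependency_distance(projective), count_crossings(projective))
print(mean_dependency_distance(non_projective), count_crossings(non_projective))
```

The pairwise crossing check is quadratic in sentence length, which is adequate for single sentences; a simulation over many long random sequences might call for a faster method.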
Such a hierarchical tree must ultimately be arranged into a linear sequence for the purpose of spoken and written communication. So far, research has repeatedly observed two phenomena in the linear realization of hierarchical dependency structure: the minimization of dependency distance (the number of intervening words) between two syntactically related words [4-13], and the rarity of crossing dependency relations [14,15]. Liu [5] compared the dependency distance of 20 natural languages with that of two different random languages, and pointed out that dependency distance minimization seems to be universal in human languages. Ferrer-i-Cancho has analyzed this minimization theoretically [8,9]. A recent study based on 37 languages obtained similar findings [11]. Since dependency distance is held to be cognitively related to language processing load [16], the minimization of dependency distance is probably a result of the principle of least effort [17]. In addition, it is argued that the rarity of crossing dependencies is simply a by-product of the pressure to minimize dependency distance and cognitive cost in language processing, having little to do with the syntax of the language [7-10]. Similarly, some studies find that dependency distance increases significantly if dependency crossings are permitted, and suggest that reducing dependency crossings is probably an important means of restraining dependency distance [4,5]. Dependency distance and crossings are closely related, and in human languages both seem to be subject to minimization. Ferrer-i-Cancho [9,10] has theoretically proven that, for sufficiently short dependency lengths, the probability that two edges cross decreases as their length decreases. However, Liu has found that a projective random language (i.e. one without any crossing dependency) has a significantly longer mean dependency distance than natural language [4,5]. Therefore,


Similar resources

Numerical study of heat gradient and crossing energy to around building walls containing phase change materials in Kashan temperature conditions

The application of phase change materials (PCM) in various parts of a building, owing to the high capacity of these materials, leads to an improvement in temperature conditions and a reduction in energy consumption. Due to the high dependency of the performance of these materials on ambient temperature fluctuations, their application in climates with extreme temperature fluctuations has a sig...


Structure Alignment Using Bilingual Chunking

A new statistical method called “bilingual chunking” for structure alignment is proposed. Unlike existing approaches, which align hierarchical structures like sub-trees, our method conducts alignment on chunks. The alignment is performed through a simultaneous bilingual chunking algorithm. Using the constraints of chunk correspondence between source language (SL) and target language (...


P45: The Effects of Nigella sativa on Sickness Behavior Induced by Lipopolysaccharide in Male Wistar Rats

Neuroimmune factors contribute to the pathogenesis of sickness behaviors. Nigella sativa (NS) has anti-inflammatory, anti-anxiety and anti-depressive effects. In the present study, the effect of NS hydro-alcoholic extract on sickness behavior induced by lipopolysaccharide (LPS) was investigated. The rats were divided into five groups (n=10 in each): (1) control (saline), (2) LPS (1 mg/kg, admin...


Learning Dependency Relations of Japanese Compound Functional Expressions

This paper proposes an approach to processing Japanese compound functional expressions by identifying them and analyzing their dependency relations through a machine learning technique. First, we formalize the task of identifying Japanese compound functional expressions in a text as a machine-learning-based chunking problem. Next, against the results of identifying compound functional expressio...


A Unified Single Scan Algorithm for Japanese Base Phrase Chunking and Dependency Parsing

We describe an algorithm for Japanese analysis that performs base phrase chunking and dependency parsing simultaneously in linear time with a single scan of a sentence. In this paper, we present pseudocode for the algorithm and evaluate its performance empirically on the Kyoto University Corpus. Experimental results show that the proposed algorithm with the voted perceptron yields reasonably go...


Journal:
  • CoRR

Volume abs/1509.01310  Issue 

Pages  -

Publication date 2015